

Warner Bros. Discovery teams up with Google to generate captions using AI

Engadget

Warner Bros. Discovery (WBD) has struck a deal with Google Cloud to use the latter's Vertex AI to generate captions for programming across a variety of platforms. WBD claims that its Caption AI system can significantly reduce production time and costs while improving the accuracy of captions for US-based viewers. The tech will be used for unscripted programming at the outset, which could include news, sports and reality TV across the likes of Max, CNN and Discovery. WBD claims the system can reduce the time it takes to create captions by up to 80 percent and captioning costs by up to 50 percent. There will still be a level of human review for quality assurance, and the company claims this approach will help refine and train Caption AI's workflow to improve it over time.


CLIP for Language-Image Representation

#artificialintelligence

Have you ever wondered how machines can understand the meaning behind a photograph? CLIP, the Contrastive Language-Image Pre-training model, is changing the game of image-language understanding. In this post, we will explore what makes CLIP's capabilities so striking. We have seen AI's potential to solve many problems in our world. Famous AI models such as ChatGPT, LLaMA, and DALL-E, which are changing our lives (in a good way, I suppose), are direct evidence.
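The contrastive matching at the heart of CLIP can be sketched with toy vectors. The embeddings below are invented stand-ins for the outputs of CLIP's real image and text encoders (a vision Transformer or ResNet, and a text Transformer); only the scoring step is illustrated.

```python
import numpy as np

# Toy embeddings standing in for encoder outputs; real CLIP embeddings
# are learned and much higher-dimensional.
image_emb = np.array([0.9, 0.1, 0.2])
caption_embs = {
    "a dog barking":    np.array([0.8, 0.2, 0.1]),
    "a man on a bench": np.array([0.1, 0.9, 0.3]),
    "a city skyline":   np.array([0.2, 0.1, 0.9]),
}

def normalize(v):
    return v / np.linalg.norm(v)

def clip_style_scores(img, texts, temperature=0.07):
    """Score captions against an image with cosine similarity, then turn
    the temperature-scaled scores into a probability distribution,
    mirroring CLIP's contrastive objective at inference time."""
    img = normalize(img)
    sims = np.array([normalize(t) @ img for t in texts.values()])
    logits = sims / temperature
    probs = np.exp(logits - logits.max())   # stable softmax
    return probs / probs.sum()

probs = clip_style_scores(image_emb, caption_embs)
best = max(zip(caption_embs, probs), key=lambda kv: kv[1])[0]
```

With these toy vectors, the image embedding is closest to "a dog barking", so that caption receives almost all of the probability mass.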


A system to produce context-aware captions for news images

#artificialintelligence

Computer systems that can automatically generate image captions have been around for several years. While many of these techniques perform reasonably well, the captions they produce are typically generic and somewhat uninteresting, containing simple descriptions such as "a dog is barking" or "a man is sitting on a bench." Alasdair Tran, Alexander Mathews and Lexing Xie at the Australian National University have been trying to develop new systems that can generate more sophisticated and descriptive image captions. In a paper recently pre-published on arXiv, they introduced an automatic captioning system for news images that takes the general context behind an image into account while generating new captions. The goal of their study was to enable the creation of captions that are more detailed and more closely resemble those written by humans.


How to Generate Text from Images with Python

#artificialintelligence

In the Google Search: State of the Union last May, John Mueller and Martin Splitt devoted about a fourth of the address to image-related topics. They announced a big list of improvements to Google Image Search and predicted that image search would be a massive untapped opportunity for SEO. SEO Clarity, an SEO tool vendor, released a very interesting report around the same time. Among other findings, they found that more than a third of web search results include images. Images are important to search visitors not only because they are more visually attractive than text, but also because they instantly convey context that would take far longer to absorb by reading.


A Neural Compositional Paradigm for Image Captioning

Dai, Bo, Fidler, Sanja, Lin, Dahua

Neural Information Processing Systems

Mainstream captioning models often follow a sequential structure to generate captions, leading to issues such as introduction of irrelevant semantics, lack of diversity in the generated captions, and inadequate generalization performance. In this paper, we present an alternative paradigm for image captioning, which factorizes the captioning procedure into two stages: (1) extracting an explicit semantic representation from the given image; and (2) constructing the caption based on a recursive compositional procedure in a bottom-up manner. Compared to conventional ones, our paradigm better preserves the semantic content through an explicit factorization of semantics and syntax. By using the compositional generation procedure, caption construction follows a recursive structure, which naturally fits the properties of human language. Moreover, the proposed compositional procedure requires less data to train, generalizes better, and yields more diverse captions.
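The two-stage factorization described in the abstract can be sketched with a toy example. Everything here is invented for illustration: the paper learns both the semantic extractor and the composition procedure, whereas this sketch substitutes a hand-written lookup for stage (1) and a simple recursive merge with fixed connectives for stage (2).

```python
def extract_semantics(image_tag):
    # Stage 1 stand-in: a lookup table instead of a learned extractor
    # of noun-phrase semantics from the image.
    return {"dog": ["a brown dog", "a frisbee"]}.get(image_tag, [])

def compose(phrases, connectives=("with", "near")):
    # Stage 2 stand-in: recursively merge adjacent phrases bottom-up
    # into ever-larger fragments, until one full caption remains.
    if len(phrases) <= 1:
        return phrases[0] if phrases else ""
    merged = f"{phrases[0]} {connectives[0]} {phrases[1]}"
    rest = list(phrases[2:])
    return compose([merged] + rest, connectives[1:] + connectives[:1])

caption = compose(extract_semantics("dog"))
```

The bottom-up recursion is the point of contrast with sequential (left-to-right) decoders: semantics are fixed before any syntax is produced, so irrelevant content cannot be hallucinated mid-sentence.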



Building an image caption generator with Deep Learning in Tensorflow

#artificialintelligence

In my last tutorial, you learned how to create a facial recognition pipeline in Tensorflow with convolutional neural networks. In this tutorial, you'll learn how a convolutional neural network (CNN) and Long Short-Term Memory (LSTM) can be combined to create an image caption generator and generate captions for your own images. In 2014, researchers from Google released a paper, Show and Tell: A Neural Image Caption Generator. At the time, this architecture was state-of-the-art on the MSCOCO dataset. It utilized a CNN paired with an LSTM to take an image as input and output a caption.
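The Show-and-Tell idea, a CNN feature vector seeding an LSTM that emits words one at a time, can be sketched in a few lines of NumPy. The vocabulary, "CNN feature", and all weights below are random toys standing in for a trained model; a real implementation would use a pretrained CNN and train the decoder on MSCOCO.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab = ["<start>", "a", "dog", "on", "grass", "<end>"]
V, H = len(vocab), 8   # vocabulary size, hidden size

def lstm_step(x, h, c, W):
    # One step of a standard LSTM cell: input, forget, output gates
    # and the candidate cell update.
    z = W["x"] @ x + W["h"] @ h + W["b"]
    i, f, o, g = np.split(z, 4)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sig(f) * c + sig(i) * np.tanh(g)
    h = sig(o) * np.tanh(c)
    return h, c

W = {"x": rng.normal(0, 0.5, (4 * H, V)),
     "h": rng.normal(0, 0.5, (4 * H, H)),
     "b": np.zeros(4 * H)}
W_out = rng.normal(0, 0.5, (V, H))     # hidden state -> word logits
cnn_feature = rng.normal(0, 1, H)      # stand-in for the CNN encoding

def generate_caption(max_len=10):
    # The image feature seeds the decoder's initial hidden state.
    h, c = cnn_feature.copy(), np.zeros(H)
    word, caption = "<start>", []
    for _ in range(max_len):
        x = np.eye(V)[vocab.index(word)]          # one-hot previous word
        h, c = lstm_step(x, h, c, W)
        word = vocab[int(np.argmax(W_out @ h))]   # greedy decoding
        if word == "<end>":
            break
        caption.append(word)
    return caption

caption = generate_caption()
```

With untrained weights the output words are arbitrary, but the decode loop, greedily feeding each emitted word back in as the next input until `<end>` or a length cap, is the same loop a trained Show-and-Tell model runs.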


Japanese researchers create mind-reading A.I. that can transcribe a person's thoughts

#artificialintelligence

By now, you probably already know that artificial intelligence (AI) technology is developing at a remarkably fast rate. It has been the subject of many essays and news articles that are mainly about an oncoming robot takeover, which could mean trouble for many humans in the world today. But in order for advanced AI robots to truly take over the world, they've got to first be able to think like humans. Now a group of researchers have taken the first steps to making that a reality. Japanese researchers recently published a new study titled, "Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli," wherein AI technology was used to predict – with great accuracy – exactly what people were thinking when they were looking at certain pictures.